# Overview

In this project, we look at the player data provided by FIFA, which contains information such as personal details, wages, physical attributes, technical skills, potential, and positional strengths. The data is primarily from FIFA 2018. Through this project, you will get a glimpse of the insights behind the beautiful game and the kind of information and decisions a football manager works through.

# Objective

Preliminary Data Analysis. Explore the dataset and practice extracting basic observations about the data. The idea is for you to get comfortable working in Python.

You are expected to do the following:

- Come up with the players' profile (characteristics of a player) of the different teams/countries.
- Generate a set of insights and recommendations that will help the coach to understand the competition.

Data Dictionary:

Feature - Explanation
player index number
Name - name of a player
Age - age of a player
Nationality - player nationality
Overall - Overall rating of the player
Potential - Potential rating of the player
Club - The international club for which the player plays
Value - The market value of the player in the transfer market

## Player Skills and other self-explanatory attributes

Special
Wage
Acceleration
Aggression
Agility
Balance
Ball control
Composure
Crossing
Curve
Dribbling
Finishing
Free kick accuracy
GK diving
GK handling
GK kicking
GK positioning
GK reflexes
Heading accuracy
Interceptions
Jumping
Long passing
Long shots
Marking
Penalties
Positioning
Reactions
Short passing
Shot power
Sliding tackle
Sprint speed
Stamina
Standing tackle
Strength
Vision
Volleys
CAM - Center Attacking Midfielder
CB - Center Back
CDM - Center Defensive Midfielder
CF - Center Forward
CM - Center Midfielder
ID - Player's ID in FIFA18
LAM - Left Attacking Midfielder
LB - Left Back
LCB - Left Center Back
LCM - Left Center Midfielder
LDM - Left Defensive Midfielder
LF - Left Forward
LM - Left Midfielder
LS - Left Striker
LW - Left Wing
LWB - Left Wing Back
Preferred Positions - Player's Preferred Position
RAM - Right Attacking Midfielder
RB - Right Back
RCB - Right Center Back
RCM - Right Center Midfielder
RDM - Right Defensive Midfielder
RF - Right Forward
RM - Right Midfielder
RS - Right Striker
RW - Right Wing
RWB - Right Wing Back
ST - Striker

Importing packages and creating the dataframe

    import pandas as pd
    import numpy as np
    import seaborn as sns
    import matplotlib.pyplot as plt
    import warnings

    # Show floats with two decimals, all columns, and up to 200 rows
    pd.set_option('display.float_format', lambda x: '%.2f' % x)
    pd.set_option('display.max_columns', None)
    pd.set_option('display.max_rows', 200)
    warnings.filterwarnings('ignore')

    # Load the raw data and work on a copy so the original frame stays untouched
    fifa = pd.read_excel("FIFA.xlsx")
    df = fifa.copy()

Check how many rows and columns are in the dataset.

    df.shape

Describe the dataset to get the stats of each column.

    df.describe()

See what the data types are for each column.

Create a new Position column holding each player's first listed preferred position, and show that the column was created.

    # Position codes are two or three characters long, so the first three
    # characters (stripped) give the first listed position
    df['Position'] = df['Preferred Positions'].apply(lambda x: x[:3].strip())
    df.head()

    columns_to_numerical = [
        'Acceleration', 'Aggression', 'Agility', 'Balance', 'Ball control', 'Composure',
        'Crossing', 'Curve', 'Dribbling', 'Finishing', 'Free kick accuracy', 'GK diving',
        'GK handling', 'GK kicking', 'GK positioning', 'GK reflexes', 'Heading accuracy',
        'Interceptions', 'Jumping', 'Long passing', 'Long shots', 'Marking', 'Penalties',
        'Positioning', 'Reactions', 'Short passing', 'Shot power', 'Sliding tackle',
        'Sprint speed', 'Stamina', 'Standing tackle', 'Strength', 'Vision', 'Volleys'
    ]

    # Expand the 'K' and 'M' suffixes into multipliers and evaluate the result
    df['Value'] = df['Value'].replace({'K': '*1e3', 'M': '*1e6'}, regex=True).map(pd.eval).astype(int)
    df['Wage'] = df['Wage'].replace({'K': '*1e3', 'M': '*1e6'}, regex=True).map(pd.eval).astype(int)

    # Coerce the skill ratings to numbers; entries that cannot be parsed become NaN
    for col in columns_to_numerical:
        df[col] = pd.to_numeric(df[col], errors='coerce')

    df.info()

Reduce dimension by combining position types and stats.

    position_cols = [
        'LS', 'ST', 'RS', 'LW', 'LF', 'CF', 'RF', 'RW', 'LAM', 'CAM', 'RAM', 'LM', 'LCM',
        'CM', 'RCM', 'RM', 'LWB', 'LDM', 'CDM', 'RDM', 'RWB', 'LB', 'LCB', 'CB', 'RCB', 'RB'
    ]

    positiontype_to_cols = {
        'Attack': ['LS', 'ST', 'RS', 'LF', 'CF', 'RF'],
        'Midfield': ['LW', 'RW', 'LAM', 'CAM', 'RAM', 'LM', 'LCM', 'CM', 'RCM', 'RM', 'CDM', 'RDM', 'LDM'],
        'Defense': ['LWB', 'RWB', 'LB', 'LCB', 'CB', 'RCB', 'RB']
    }

    # One rounded average rating per position group
    for pos_type, colvec in positiontype_to_cols.items():
        df[pos_type + '_rate'] = df[colvec].mean(axis=1, skipna=True).round()

    summaryname_to_cols = {
        'Passing': ['Crossing', 'Short passing', 'Long passing'],
        'Shooting': ['Finishing', 'Volleys', 'Free kick accuracy', 'Shot power', 'Long shots', 'Penalties', 'Heading accuracy'],
        'Defending': ['Marking', 'Standing tackle', 'Sliding tackle', 'Interceptions'],
        'Speed': ['Sprint speed', 'Agility', 'Acceleration', 'Reactions', 'Stamina'],
        'Control': ['Ball control', 'Curve', 'Dribbling'],
        'GoalKeeping': ['GK diving', 'GK handling', 'GK positioning', 'GK kicking', 'GK reflexes'],
        'Mental': ['Composure', 'Vision', 'Aggression', 'Positioning'],
        'Power': ['Strength', 'Balance', 'Jumping'],
        'Avg_rating': ['Overall', 'Potential']
    }

    # One rounded average per skill group
    for summarycol, colvec in summaryname_to_cols.items():
        df[summarycol] = df[colvec].mean(axis=1, skipna=True).round()

    # Drop the detailed columns now that the summary columns exist
    df.drop(position_cols, axis=1, inplace=True)
    cols_to_drop = [colname for colvec in summaryname_to_cols.values() for colname in colvec]
    df.drop(cols_to_drop, axis=1, inplace=True)

    cols_to_remove = ['Photo', 'Flag', 'Club Logo']
    df.drop(cols_to_remove, axis=1, inplace=True)

    df.shape

    df.describe()

In the dataframe there are 2029 rows with no attack, defense, or midfield rate.

    df.isnull().sum().sort_values(ascending=False)

    df[df['GoalKeeping'].isnull()]

    df[df['Position'] == 'GK'].isnull().sum().sort_values(ascending=False)

There are also 2029 goalkeepers in the dataset, and they account for 100% of the null values in those three columns.

    df[df['GoalKeeping'].isnull()]

    df.drop(df[df['GoalKeeping'].isnull()].index, inplace=True)

There are 3 rows missing the GoalKeeping stat, and they are goalkeepers, so these rows are removed.
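
The claim that goalkeepers account for every missing position rate can be checked directly by counting nulls per position. The snippet below is a minimal sketch on an invented toy frame (the `toy` data and the names `rate_cols`, `nulls_by_position`, and `positions_with_nulls` are assumptions for illustration, not part of the notebook); on the real data, `toy` would be replaced by `df`.

```python
import numpy as np
import pandas as pd

# Toy stand-in for the FIFA frame; all values are invented for illustration.
toy = pd.DataFrame({
    "Position": ["GK", "ST", "GK", "CB", "CM"],
    "Attack_rate": [np.nan, 80.0, np.nan, 55.0, 70.0],
    "Midfield_rate": [np.nan, 72.0, np.nan, 60.0, 78.0],
    "Defense_rate": [np.nan, 40.0, np.nan, 79.0, 66.0],
})

rate_cols = ["Attack_rate", "Midfield_rate", "Defense_rate"]

# Count missing values per position for each rate column.
nulls_by_position = toy[rate_cols].isnull().groupby(toy["Position"]).sum()

# Positions that contribute at least one missing value.
has_nulls = nulls_by_position.sum(axis=1) > 0
positions_with_nulls = nulls_by_position[has_nulls].index.tolist()

print(nulls_by_position)
print(positions_with_nulls)
```

If the list contains only `GK`, the missing rates are structural (goalkeepers have no outfield ratings) rather than data-entry gaps.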

    # Fill the remaining gaps in each summary column with that column's mean
    # (fillna avoids the np.NaN alias, which was removed in NumPy 2.0)
    for col in ['Mental', 'Passing', 'Defending', 'Control', 'Power', 'Speed', 'Shooting']:
        df[col] = df[col].fillna(df[col].mean())

A few rows have missing values; since they are far fewer than the total number of rows, I set them to the column mean.
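
As a small illustration of the mean fill, the sketch below applies it to an invented toy column and compares it with one possible alternative: filling with the mean of players in the same position. The position-wise fill is an assumption offered for comparison, not what the notebook does, and the `toy` frame and variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Invented data: two strikers and three center backs, with two gaps.
toy = pd.DataFrame({
    "Position": ["ST", "ST", "CB", "CB", "CB"],
    "Passing": [70.0, np.nan, 50.0, 60.0, np.nan],
})

# Global mean fill, mirroring the approach used in the notebook.
global_fill = toy["Passing"].fillna(toy["Passing"].mean())

# Alternative (assumption): fill each gap with the mean of the same position.
position_mean = toy.groupby("Position")["Passing"].transform("mean")
position_fill = toy["Passing"].fillna(position_mean)

print(global_fill.tolist())
print(position_fill.tolist())
```

With only a handful of missing rows the two approaches should give very similar summary statistics; the position-wise variant matters more when many values are missing within one position.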

    df.isnull().sum().sort_values(ascending=False)

    dfGK = df[df['Position'] == 'GK']

    dfGK.describe(include='all')

    # Quick summary of each column
    df.describe().T

    def boxplot(column):
        # Draw a horizontal boxplot; a marker indicates the mean value of the column
        sns.boxplot(x=column, showmeans=True, color='yellow')

    def histogram(column):
        # distplot is deprecated in recent seaborn releases; histplot draws an
        # equivalent count histogram
        sns.histplot(x=column, color="blue", kde=False)

    boxplot(df['Value'])

    histogram(df['Value'])

    df[df['Value'] > 650000].count()

Value is right skewed, with some players valued drastically higher than the average.
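
The skew read off the plots can also be put into numbers. The sketch below uses invented market values (not taken from the dataset) to show `Series.skew()` on the raw values and on a log scale, plus the share of players above the mean; the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

# Invented market values: many modest ones and a few very large ones.
values = pd.Series(
    [50_000, 80_000, 120_000, 200_000, 350_000, 900_000, 5_000_000, 60_000_000],
    dtype=float,
)

raw_skew = values.skew()            # positive for a right-skewed column
log_skew = np.log10(values).skew()  # a log scale compresses the long right tail
share_above_mean = (values > values.mean()).mean()

print(f"skew (raw): {raw_skew:.2f}")
print(f"skew (log10): {log_skew:.2f}")
print(f"share of players above the mean: {share_above_mean:.3f}")
```

A positive skew together with a small share of players above the mean is the numeric counterpart of the long right tail in the histogram.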

    boxplot(df['Wage'])

    histogram(df['Wage'])

Wage is similar to Value in that it is right skewed.

    boxplot(df['Special'])

    histogram(df['Special'])

Special is slightly left skewed.

    boxplot(df['Attack_rate'])

    histogram(df['Attack_rate'])

Attack Rate is slightly left skewed, with a few minor outliers on either side.

    boxplot(df['Defense_rate'])

    histogram(df['Defense_rate'])

Defense Rate is close to symmetrical, with no outliers.

    boxplot(df['Midfield_rate'])

    histogram(df['Midfield_rate'])

    df['Midfield_rate'].describe()

Midfield Rate is symmetrical, with some outliers on each side of the IQR.
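
The outliers drawn by a boxplot are, by default, the points lying more than 1.5 times the IQR beyond the quartiles. A minimal sketch of that rule follows, so outliers can be counted rather than eyeballed; the helper `iqr_outliers` and the sample ratings are hypothetical and not part of the notebook.

```python
import pandas as pd

def iqr_outliers(series: pd.Series, whisker: float = 1.5) -> pd.Series:
    """Return the values lying more than `whisker` * IQR outside the quartiles."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    lower = q1 - whisker * iqr
    upper = q3 + whisker * iqr
    return series[(series < lower) | (series > upper)]

# Invented ratings with one low and one high extreme value.
ratings = pd.Series([20, 55, 58, 60, 61, 62, 63, 65, 67, 95])
outliers = iqr_outliers(ratings)

print(outliers.tolist())
```

On the real data, a call such as `iqr_outliers(df['Midfield_rate'])` would list the players behind the points outside the whiskers.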

    boxplot(df['Passing'])

    histogram(df['Passing'])

    df['Passing'].describe()

Passing is left skewed, with outliers on both sides.

    boxplot(df['Shooting'])

    histogram(df['Shooting'])

Shooting is left skewed with some outliers on the left, probably from players in positions that don't shoot often.

    boxplot(df['Defending'])

    histogram(df['Defending'])

Defending is left skewed, with no outliers.

    boxplot(df['Speed'])

    histogram(df['Speed'])

Speed is slightly left skewed, with outliers on the left.

    boxplot(df['Control'])

    histogram(df['Control'])

Control is left skewed, with outliers on the left.

    boxplot(df['GoalKeeping'])

I use a separate dataframe, 'dfGK', below, which only contains players who have the Position 'GK', since the goalkeeping stat is relevant only to them.

    boxplot(dfGK['GoalKeeping'])

    histogram(dfGK['GoalKeeping'])

Goalkeeping is symmetrical, with a few outliers on each side.
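
To see why the goalkeeping stat is better analysed on goalkeepers alone, it can help to summarise it separately for goalkeepers and outfield players. This is a rough sketch with invented ratings; the labels `goalkeeper` and `outfield` and the variable names are assumptions for illustration.

```python
import pandas as pd

# Invented ratings: goalkeepers score high on GoalKeeping, outfield players low.
toy = pd.DataFrame({
    "Position": ["GK", "GK", "ST", "CB", "CM", "LW"],
    "GoalKeeping": [78.0, 70.0, 12.0, 11.0, 14.0, 9.0],
})

# Label each row as goalkeeper or outfield and summarise the stat per group.
group = (toy["Position"] == "GK").map({True: "goalkeeper", False: "outfield"})
summary = toy.groupby(group)["GoalKeeping"].agg(["mean", "min", "max"])

print(summary)
```

When the two groups barely overlap, a single boxplot over all players mostly reflects the outfield majority, which is the reason for plotting `dfGK` on its own.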

    boxplot(df['Mental'])

    histogram(df['Mental'])

Mental is left skewed, with outliers on the left.

    boxplot(df['Power'])

    histogram(df['Power'])

Power is symmetrical, with outliers on each side.

    boxplot(df['Avg_rating'])

    histogram(df['Avg_rating'])

Average rating is right skewed, with outliers on each side.

    # Correlate numeric columns only; the frame still holds text columns such as Name and Club
    corr = df.corr(numeric_only=True)
    plt.figure(figsize=(15, 10))
    sns.heatmap(corr, cmap='seismic', annot=True);

Average Rating is highly correlated with Midfield rate, Attack rate, and Value, showing that players with high skill in attacking or midfield positions are typically rated higher. Attack rate is highly correlated with Control and Shooting ability, as is Midfield rate.
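
Reading the strongest relationships off a large annotated heatmap is error-prone, so one option is to list the correlation pairs sorted by absolute strength. The sketch below does this on an invented toy frame; the data and the names `mask` and `pairs` are assumptions, and on the real data `toy` would be replaced by `df`.

```python
import numpy as np
import pandas as pd

# Invented values for four of the summary columns.
toy = pd.DataFrame({
    "Avg_rating": [60, 65, 70, 75, 80, 85],
    "Attack_rate": [58, 66, 69, 77, 79, 86],
    "Defending": [70, 40, 65, 45, 60, 50],
    "Value": [1, 2, 4, 8, 16, 32],
})

corr = toy.corr(numeric_only=True)

# Keep only the upper triangle so each pair appears once and the diagonal is dropped.
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
pairs = (
    corr.where(mask)
    .stack()
    .dropna()
    .sort_values(key=lambda s: s.abs(), ascending=False)
)

print(pairs)
```

The first few entries of `pairs` correspond to the darkest off-diagonal cells of the heatmap.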

Plotting a few teams against each other, since I can't do it for every team.

    teams = ['Manchester City', 'FC Barcelona', 'Chelsea', 'Juventus']

    # Per-club frames, using the same club names as the cells that follow
    man = df[df['Club'] == 'Manchester City']
    bar = df[df['Club'] == 'FC Barcelona']
    che = df[df['Club'] == 'Chelsea']
    juv = df[df['Club'] == 'Juventus']

    sns.pairplot(data=df.drop(['Avg_rating', 'Value', 'Wage'], axis=1))
    plt.show()

    # Keep only the four clubs being compared
    reducedDF = df.loc[df['Club'].isin(teams)]

    sns.pairplot(data=reducedDF.drop(['Avg_rating', 'Value', 'Wage'], axis=1), hue='Club')
    plt.show()

    plt.figure(figsize=(15, 7))
    sns.boxplot(x=reducedDF["Club"], y=reducedDF["Attack_rate"], palette="PuBu")
    plt.show()

Barcelona has the highest average attack rate among the four. The others will want to be mindful of this when they face Barcelona and concentrate on their defense.
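
The comparison of average attack rates can be backed by a small table alongside the boxplot. The following is a sketch on invented numbers (the ratings below are not from the dataset, and `attack_summary` is a hypothetical name); on the real data, `toy` would be `reducedDF`.

```python
import pandas as pd

# Invented attack rates for three clubs, two players each.
toy = pd.DataFrame({
    "Club": ["FC Barcelona", "FC Barcelona", "Chelsea", "Chelsea", "Juventus", "Juventus"],
    "Attack_rate": [80, 76, 70, 74, 75, 71],
})

# Mean, median, and squad size per club, ordered from strongest to weakest mean.
attack_summary = (
    toy.groupby("Club")["Attack_rate"]
    .agg(["mean", "median", "count"])
    .sort_values("mean", ascending=False)
)

print(attack_summary)
```

Including the count helps judge how much weight a club's average deserves when squads differ in size.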

    plt.figure(figsize=(15, 7))
    sns.boxplot(x=reducedDF["Club"], y=reducedDF["Midfield_rate"], palette="PuBu")
    plt.show()

    plt.figure(figsize=(15, 7))
    sns.boxplot(x=reducedDF["Club"], y=reducedDF["Defense_rate"], palette="PuBu")
    plt.show()

    plt.figure(figsize=(15, 7))
    sns.boxplot(x=reducedDF["Club"], y=reducedDF["Value"], palette="PuBu")
    plt.show()

    reducedDF[['Value', 'Wage', 'Club']].groupby('Club').sum()

Barcelona spends significantly more on its players' wages than the other teams and has the greatest total player value. However, the other teams have a better Value/Wage ratio, showing that the extra bit of performance from Barcelona's players comes at a higher cost.
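
The Value/Wage comparison is a simple ratio of the two club totals. A minimal sketch with invented figures is shown below; the amounts are not from the dataset, and the column name `Value_per_wage` is a hypothetical choice.

```python
import pandas as pd

# Invented player values and wages for two clubs.
toy = pd.DataFrame({
    "Club": ["FC Barcelona", "FC Barcelona", "Juventus", "Juventus"],
    "Value": [100_000_000, 60_000_000, 50_000_000, 40_000_000],
    "Wage": [500_000, 300_000, 150_000, 150_000],
})

# Total value and total wage per club, then market value per unit of wage.
totals = toy.groupby("Club")[["Value", "Wage"]].sum()
totals["Value_per_wage"] = totals["Value"] / totals["Wage"]

print(totals.sort_values("Value_per_wage", ascending=False))
```

In this toy example the club with the larger total value has the lower ratio, which is the pattern described above: more value overall, but less value per unit of wage.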

    reducedDF[reducedDF['Club'] == 'FC Barcelona'].describe()

    reducedDF[reducedDF['Club'] == 'Chelsea'].describe()

    reducedDF[reducedDF['Club'] == 'Juventus'].describe()

    reducedDF[reducedDF['Club'] == 'Manchester City'].describe()

# Conclusion

## Insights and recommendations:

FC Barcelona outperforms the other teams in most metrics, having the highest average Attack, Defense, and Midfield rates, as well as Passing, Shooting, and Average Player Rating. However, their Goalkeeping rating is lower than or equal to every other team's except Manchester City's.

Chelsea and Manchester City have slightly weaker attack scores, so they could benefit from training or recruiting more in that area. Looking at the team vs. value graph, Barcelona has the greatest value, which is reflected in its higher performance ratings. To compete against them, other teams will have to either improve the players they already have or recruit more skilled players, which could cost more money.